Abstract

This paper deals with the weak continuity, Fisher-consistency and differentiability of estimating 
functionals corresponding to a class of both linear and nonlinear regression high breakdown M estimates, 
which includes S and MM estimates. A restricted type of differentiability, called weak differentiability, 
is defined, which suffices to prove the asymptotic normality of estimates based on the functionals. This 
approach allows us to prove the consistency, asymptotic normality and qualitative robustness of estimates 
under more general conditions than those required in standard approaches. 
1 Introduction

We consider estimation in the regression model with random predictors 

$$y_i = g(x_i, \beta_0) + u_i, \tag{1}$$

with data $(x_i, y_i) \in R^p \times R$, $i = 1, \ldots, n$, where $\beta_0 \in B \subset R^q$ is a vector of unknown parameters, $g(x, \beta)$ is a
known function continuous in $\beta$, and for each $i$, $x_i$ and $u_i$ are independent. It is assumed that $\{(x_i, y_i), i \geq 1\}$
are identically distributed but not necessarily independent. The well-known fact that the least squares (LS)
estimate of $\beta_0$ is sensitive to atypical observations has motivated the development of robust estimates.

An important class of robust estimators is that of M estimates. Within this class we can distinguish the
S estimates introduced by Rousseeuw and Yohai (1984) and the MM estimates proposed by Yohai (1987).
For linear regression, S estimates may attain the highest possible breakdown point, and MM estimates may
combine the highest possible breakdown point with a high normal efficiency; see, e.g., Maronna, Martin
and Yohai (2006, Chapter 5). In the case of nonlinear regression, MM estimates may also combine a high
breakdown point with high normal efficiency. In fact, the normal efficiency of these estimates can be made
as close to one as desired, and Monte Carlo simulations in Fasano (2009) show them to have a highly robust
behavior for some nonlinear models.

For nonlinear regression, Fraiman (1983) studies bounded influence estimates. Sakata
and White (2001) deal with S estimates for nonlinear regression models with dependent observations; Vainer
and Kukush (1998) and Liese and Vajda (2003, 2004) deal with M estimates with fixed scale, which are therefore
not scale equivariant. The latter study the $\sqrt{n}$-consistency of M estimates in more general models, which
include linear and nonlinear regression with independent observations. Stromberg (1995) proved the weak
consistency of the least median of squares (LMS) estimate, and Cizek (2005) dealt with the consistency and
the asymptotic normality of the least trimmed squares (LTS) estimate under dependency.

Three important qualitative features of these estimates are consistency, asymptotic normality and qual- 
itative robustness. These properties have been studied in the literature through specific approaches. Yohai 
(1987) proved these properties for MM estimates in the i.i.d. linear case, and Fasano (2009) proved them in 
the nonlinear case, both assuming symmetrically distributed $u_i$'s. 

In this work we propose an alternative approach, based on the representation of the estimates as
functionals on distributions (Hampel 1971). For a large class of estimates, which includes M estimates, one can
define a functional $T(G)$ on the space of data distributions, such that if $G_n$ is the empirical distribution,
then $T(G_n)$ is the estimate, and if $G_0$ is the underlying distribution, then $T(G_0)$ is the parameter that we
want to estimate. The weak continuity of the functional $T$ simplifies the proof of consistency of $T(G_n)$, and
some suitable forms of differentiability of $T$, such as Fréchet or Hadamard differentiability, allow simple proofs
of the asymptotic normality of the estimate under very general conditions. These results hold without the
requirement that $G_n$ be the empirical distribution of a sequence of i.i.d. random variables: if we want to
estimate $T(G_0)$, it suffices that $G_n$ converges weakly to $G_0$ a.s. The weak continuity of M functionals at a
general statistical model was studied by Clarke (1983, 2000). Fréchet differentiability was studied by
Boos and Serfling (1980) and Clarke (1983), and Hadamard differentiability by Fernholz (1983). In all of
these works, it is required that the score function used for the M estimate be bounded, and therefore their
results cannot be applied to regression M estimates. In this paper we prove under very general conditions
that the functionals associated with M estimates of regression are weakly continuous. Besides, since the usual
forms of differentiability, like Fréchet or Hadamard differentiability, require in the case of M estimates the
boundedness of the score functions, we introduce a new concept of differentiability, which we call weak
differentiability, which is satisfied by high breakdown M estimates of regression, e.g., by S and MM estimates,
and which is adequate to prove the asymptotic normality of these estimates.

This work is organized as follows. In Section 2 we define the estimates to be considered, and in Sections
3, 4 and 5 we deal, respectively, with the continuity, the Fisher-consistency and the differentiability of the
functionals corresponding to the estimates defined above. These results will be shown to imply the
consistency, qualitative robustness and asymptotic normality of the estimates under assumptions more general
than the i.i.d. model and without the requirement of symmetric errors. In Section 6 we apply the results
obtained in the former sections to MM estimates. Finally, Section 7 contains all proofs.

2 Definitions of estimates 

We first define our notation. Henceforth $E_G[h(z)]$ and $P_G(A)$ will respectively denote the expectation of
$h(z)$ and the probability that $z \in A$, when $z$ is distributed according to $G$. If $z$ has distribution $G$ we
write $z \sim G$ or $\mathcal{D}(z) = G$. Weak convergence of distributions, convergence in probability and convergence
in distribution of random variables or vectors are denoted by $G_n \to_w G$, $z_n \to_p z$ and $z_n \to_d z$, respectively.
By an abuse of notation, we will write $z_n \to_d G$ to denote $\mathcal{D}(z_n) \to_w G$. The complement and the indicator
of the set $A$ are denoted by $A^c$ and $1_A$, respectively. The scalar product of vectors $a$ and $b$ is denoted by
$a'b$, and $R^+$ denotes the set of positive real numbers.

Before proceeding further, we need to clarify an important detail. If it is not assumed that the errors have
a symmetric distribution, the standard treatment of regression estimates requires some condition related to
the "centering" of the $u_i$ to ensure the identifiability of all parameters and the consistency of the estimates.
For LS, this condition is $E u_i = 0$. For M estimates it is $E\psi(u_i/\sigma) = 0$, where $\psi$ is the score function and
$\sigma$ is an error scale; the fact that this assumption depends on $\sigma$, which is an unknown parameter, makes
it undesirable. Since we want our results to hold under more general assumptions, we will employ another
(somewhat nonstandard) approach to identify $\beta_0$. Note first that in the linear case, if there is a constant term,
the slopes are always identifiable no matter the distribution of $u_i$, but the intercept is unidentified without
some centering assumption on $u_i$, such as zero median. For these reasons, besides $\beta_0$, our M estimates will
include an additional additive term $\alpha$. If the model does contain an intercept, then $\alpha$ will single it out, and $g$
will be redefined as the "non-intercept" part of the model. Otherwise, $\alpha$ may be interpreted as a "centering
constant" for $u_i$. In general, $\alpha$ remains unidentified; if it has to be identified (e.g. for prediction) then some
assumption on the centering of $u_i$ must be added to the model.

Instead of a centering condition we will require the following identifiability condition: 

$$P(g(x_i, \beta_0) = g(x_i, \beta) + \alpha) < 1 \quad \forall \beta \neq \beta_0, \ \forall \alpha. \tag{2}$$

Otherwise model (1) might also hold with $\beta$ instead of $\beta_0$ and $u_i + \alpha$ instead of $u_i$. In the linear case
$g(x, \beta) = \beta' x$ this condition means that $g$ does not include an intercept and $x_i$ is not concentrated on any
hyperplane.

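Now, in order to get consistent estimators for $\beta_0$, our estimates must always contain a term which plays
the role of an intercept. Let henceforth $\xi = (\beta', \alpha)'$ with $\alpha \in R$, and define the function

$$\underline{g}(x, \xi) = g(x, \beta) + \alpha.$$

M estimates are then defined as

$$\hat\xi_M = \arg\min_{\xi \in B \times R} \sum_{i=1}^n \rho\left(\frac{y_i - \underline{g}(x_i, \xi)}{\hat\sigma}\right), \tag{3}$$

where $\hat\sigma$ is a robust residual scale and $\rho$ is a loss function.

To define S estimates we need an M scale $S(r)$. Given $r = (r_1, \ldots, r_n)'$, $S(r)$ is defined as the solution $\sigma$ of

$$\frac{1}{n}\sum_{i=1}^n \rho_0\left(\frac{r_i}{\sigma}\right) = \delta, \tag{4}$$

where $\rho_0$ is another loss function and the constant $\delta$ regulates the estimate's robustness.
Then, S estimates of regression are defined by

$$\hat\xi_S = \arg\min_{\xi \in B \times R} S(r(\xi)), \tag{5}$$

where $r(\xi)$ is the residual vector with elements $r_i(\xi) = y_i - \underline{g}(x_i, \xi)$.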
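
The M scale in (4) is defined only implicitly, so a small numerical illustration may help. The following is a minimal sketch, not the authors' implementation: it assumes that $\rho_0$ is a bisquare function (the tuning constant `k=1.547` and the bisection solver are choices made here for illustration), and it uses the convention that the scale is 0 when the fraction of nonzero residuals does not exceed $\delta$. The names `rho_bisquare` and `m_scale` are hypothetical.

```python
import numpy as np


def rho_bisquare(u, k=1.547):
    """Bounded rho-function: 1 - (1 - (u/k)^2)^3 for |u| <= k, and 1 otherwise."""
    v = np.clip(np.abs(np.asarray(u, dtype=float)) / k, 0.0, 1.0)
    return 1.0 - (1.0 - v ** 2) ** 3


def m_scale(r, delta=0.5, k=1.547, n_iter=200):
    """Solve mean(rho0(r_i / s)) = delta for s by bisection, as in equation (4)."""
    r = np.asarray(r, dtype=float)
    # Assumed convention: if the fraction of nonzero residuals is at most
    # delta, the mean of rho0 never exceeds delta and the scale is set to 0.
    if np.mean(r != 0.0) <= delta:
        return 0.0

    def excess(s):
        return np.mean(rho_bisquare(r / s, k)) - delta

    hi = np.max(np.abs(r))
    while excess(hi) > 0.0:      # enlarge until the mean of rho0 is <= delta
        hi *= 2.0
    lo = hi
    while excess(lo) <= 0.0:     # shrink until the mean of rho0 exceeds delta
        lo /= 2.0
    for _ in range(n_iter):      # excess is nonincreasing in s, so bisection applies
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


residuals = np.linspace(-3.0, 3.0, 101)
print("M scale of the residuals:", m_scale(residuals))
```

An S estimate would then minimize `m_scale(y - g(x, beta) - alpha)` over $(\beta, \alpha)$; that outer minimization is not convex in general and is not attempted in this sketch.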

In particular we will consider in some detail the subclass of MM estimates. These estimates are
defined by (3) with $\hat\sigma$ obtained from an S estimate, namely

$$\hat\sigma = \min_{\xi \in B \times R} S(r(\xi)), \tag{6}$$

and with $\rho \leq \rho_0$. Yohai (1987) showed that in the case of linear regression the asymptotic breakdown point of
MM estimates with $\delta = 0.5$ is 0.5 if $P(\beta' x_i + \alpha = 0) = 0$ for all $\beta \neq 0$, and that, simultaneously, it is
possible to choose $\rho$ so that the corresponding MM estimate yields an arbitrarily high efficiency when the
errors are Gaussian.

Now, in order to state our results, we must first express the already defined M and S estimates as
functionals. Throughout this article loss functions will be bounded $\rho$-functions, in the following sense.

Definition 1 A bounded $\rho$-function is a function $\rho(t)$ that is a continuous nondecreasing function of $|t|$,
such that $\rho(0) = 0$, $\rho(\infty) = 1$, and $\rho(v) < 1$ implies that $\rho(u) < \rho(v)$ for $|u| < |v|$.

In the rest of the paper we will assume the following property:
R0. $\rho$ and $\rho_0$ are bounded $\rho$-functions.
Define the residual scale functional $S^*(G, \xi)$ by

$$E_G \rho_0\left(\frac{y - \underline{g}(x, \xi)}{S^*(G, \xi)}\right) = \delta \tag{7}$$

for $\delta \in (0, 1)$. Then the regression S functional $T_S$ and the associated error scale M functional $S(G)$ are
respectively defined by

$$T_S(G) := (T_{S,\beta}(G), T_{S,\alpha}(G)) = \arg\min_{\xi \in B \times R} S^*(G, \xi) \tag{8}$$

and

$$S(G) = \min_{\xi \in B \times R} S^*(G, \xi). \tag{9}$$

We will deal with a regression M functional $T_M(G)$ defined as

$$T_M(G) := (T_{M,\beta}(G), T_{M,\alpha}(G)) = \arg\min_{\xi \in B \times R} M_G(\xi), \tag{10}$$

where the function $M_G : B \times R \to R$ is

$$M_G(\xi) = E_G \rho\left(\frac{y - \underline{g}(x, \xi)}{\tilde S(G)}\right) \tag{11}$$

and $\tilde S(G)$ is an arbitrary residual scale functional, for example the one defined in (9).

It is easy to show that the S regression functional defined in (8) is also an M functional. In fact $T_S(G)$
coincides with $T_M(G)$ when in (11) we have $\rho = \rho_0$ and $\tilde S(G) = S(G)$. We may then write

$$T_S(G) = \arg\min_{\xi \in B \times R} E_G \rho_0\left(\frac{y - \underline{g}(x, \xi)}{S(G)}\right). \tag{12}$$

Remark 1 In general, the minimum at (8) or (10) might be attained at more than one value of $\xi$. It will
be henceforth assumed that the functional is well-defined by the choice of a single value. Our results will
not depend on how the choice is made. However, it will be shown in Section 7 that under very general
conditions, if $G_0$ is the distribution of $(x, y)$ satisfying (1), then $T_S(G_0)$ and $T_M(G_0)$ are unique and
$T_{S,\beta}(G_0) = T_{M,\beta}(G_0) = \beta_0$ (Fisher-consistency).



3 Weak continuity of M and S regression functionals 

We will show the weak continuity of the functionals defined above in two cases: nonlinear regression with a 
compact parameter space B, and linear regression. 
Define for $G = \mathcal{D}(x, y)$

$$c(G) = \sup\{P_G(\beta' x + \alpha = 0) : \beta \in R^p, \alpha \in R, \beta \neq 0\}. \tag{13}$$

Theorem 1 Let $G_0 = \mathcal{D}(x, y)$ be such that (10) has a unique solution $T_M(G_0)$. Assume that $\tilde S$ is weakly
continuous at $G_0$ and $\tilde S(G_0) > 0$. Then $T_M = (T_{M,\beta}, T_{M,\alpha})$ is weakly continuous at $G_0$ if either (a) $B$ is
compact, or (b) $B = R^p$, $g(x, \beta) = \beta' x$ and

$$M_{G_0}(T_M(G_0)) < 1 - c(G_0). \tag{14}$$

Theorem 2 Let $G_0 = \mathcal{D}(x, y)$ be such that $T_S(G_0)$ is unique and $S(G_0) > 0$. Assume that either (a) $B$ is
compact, or (b) $B = R^p$, $g$ is linear, i.e., $g(x, \beta) = \beta' x$, and $\delta < 1 - c(G_0)$ with $c(G)$ defined in (13). Then
$S(G)$ and $T_S(G) = (T_{S,\beta}, T_{S,\alpha})$ are weakly continuous at $G_0$.

Let now $G_0$ be the distribution of $(x, y)$ under model (1), and assume that $T_M$ (respectively $T_S$) is
Fisher-consistent for $\beta_0$, i.e., $T_{M,\beta}(G_0) = \beta_0$ (respectively $T_{S,\beta}(G_0) = \beta_0$). Then the former results imply that $T_{M,\beta}$
(respectively $T_{S,\beta}$) evaluated at the empirical distribution is consistent whenever the empirical distributions
converge to the underlying one. More precisely, we have the following result:

Corollary 1 Assume the same hypotheses as in Theorem 1 (respectively Theorem 2) plus the
Fisher-consistency of $T_M$ (respectively $T_S$): $T_{M,\beta}(G_0) = \beta_0$ (respectively $T_{S,\beta}(G_0) = \beta_0$). Call $G_n$ the empirical distribution of
$\{(x_i, y_i) : i = 1, \ldots, n\}$. If $G_n \to_w G_0$ a.s., then $\{T_{M,\beta}(G_n)\}$ (respectively $\{T_{S,\beta}(G_n)\}$) is strongly consistent
for $\beta_0$.

This result is immediate. The a.s. weak convergence of $G_n$ to $G_0$ is well-known to hold for i.i.d.
$(x_i, y_i)$ (see e.g. Billingsley 1999, Problem 3.1). It holds also under more general assumptions on the joint
distribution of $\{(x_i, y_i) : i \geq 1\}$, such as ergodicity.

We now turn to qualitative robustness. Consider a sequence of estimates $\{\hat\xi_n\}$ based on a functional $T$,
i.e. $\hat\xi_n = T(G_n)$, where $G_n$ is the empirical distribution corresponding to data $(z_1, \ldots, z_n)$. Hampel (1971)
proved that for $\{\hat\xi_n\}$ to be qualitatively robust at a distribution $G_0$ it suffices that $T$ be weakly continuous
at $G_0$ and $\hat\xi_n$ be a continuous function of $(z_1, \ldots, z_n)$.

Papantoni-Kazakos and Gray (1979) employ a weaker definition of robustness, which they call asymptotic
qualitative robustness, and prove that it is equivalent to weak continuity. Therefore Theorems 1 and 2 imply
the asymptotic qualitative robustness of $T_M$ and $T_S$.

4 Fisher-consistency of M and S estimates

In this Section we give sufficient conditions to guarantee that both (8) and (10) are minimized at unique
values, and to guarantee the Fisher-consistency for $\beta_0$.

Recall that a density $f$ is strongly unimodal if there exists $a$ such that $f(t)$ is nondecreasing for $t < a$,
nonincreasing for $t > a$, and $f$ has a unique maximum at $t = a$.

Theorem 3 is an auxiliary result, which is a small variation of one given by Mizera (1993). We will need
the following condition on $\rho$:

R1. For some $m$, $\rho(u) = 1$ iff $|u| \geq m$, and $\log(1 - \rho)$ is concave on $(-m, m)$.

Theorem 3 Let $\rho$ satisfy Condition R1 and let $F$ be a distribution with a strongly unimodal density $f$.
Then (a) there exists $t_0$ such that

$$q(t) = E_F \rho(u - t) \tag{15}$$

has a unique minimum at $t_0$; (b) if $F$ is symmetric around $\mu_0$, then $t_0 = \mu_0$.
It is easy to check that condition R1 with $m = k$ holds in particular for the popular family of bisquare
functions, defined by

$$\rho_k(u) = 1 - \left(1 - \left(\frac{u}{k}\right)^2\right)^3 I(|u| \leq k).$$
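
Theorem 3(b) can be checked numerically for the bisquare family. The sketch below is only an illustration under assumptions made here: $F$ is taken to be a normal distribution centered at a hypothetical value `MU0 = 2.0`, the tuning constant is `k = 4.685`, and $q(t)$ in (15) is approximated by a Riemann sum on a uniform grid. The function names are hypothetical.

```python
import numpy as np


def rho_bisquare(u, k):
    """Bisquare rho-function with rho(u) = 1 for |u| >= k."""
    v = np.clip(np.abs(u) / k, 0.0, 1.0)
    return 1.0 - (1.0 - v ** 2) ** 3


def q(t, density, grid, k):
    """Approximate q(t) = E_F rho(u - t) by a Riemann sum on a uniform grid."""
    du = grid[1] - grid[0]
    return np.sum(rho_bisquare(grid - t, k) * density(grid)) * du


MU0 = 2.0  # hypothetical center of symmetry of F


def normal_density(u):
    """Density of N(MU0, 1), a symmetric and strongly unimodal example."""
    return np.exp(-0.5 * (u - MU0) ** 2) / np.sqrt(2.0 * np.pi)


grid = np.linspace(MU0 - 12.0, MU0 + 12.0, 24001)   # integration grid
ts = np.linspace(MU0 - 3.0, MU0 + 3.0, 601)         # candidate values of t
qs = np.array([q(t, normal_density, grid, k=4.685) for t in ts])
t0 = ts[np.argmin(qs)]
print("grid minimizer of q:", t0, "center of symmetry:", MU0)
```

Replacing `normal_density` by an asymmetric strongly unimodal density would illustrate part (a): the minimizer remains unique on the grid but need not coincide with the mode or the mean.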

We will establish the Fisher-consistency of $T_M$. Put for brevity $\sigma = \tilde S(G_0)$, let $F_0$ be the distribution
of $u_i$ in (1), and assume that it has a strongly unimodal density. Let $\alpha_0$ denote the unique minimizer of
$E_{F_0} \rho((u - t)/\sigma)$; note that if $u_i$ is symmetric around $\mu_0$, then part (b) of Theorem 3 implies that $\alpha_0 = \mu_0$.

Theorem 4 Let $G_0$ be the joint distribution of $(x_i, y_i)$ satisfying model (1), where $u_i$ has distribution
$F_0$ with a strongly unimodal density. Assume that the identifiability condition (2) and condition R1 hold.
Then, $M_{G_0}(\xi)$ is minimized at the unique point $T_M(G_0) = (\beta_0, \alpha_0)$, and so $T_M$ is Fisher-consistent for $\beta_0$,
i.e., $T_{M,\beta}(G_0) = \beta_0$. If we also assume that $F_0$ is symmetric around $\mu_0$, we have $T_{M,\alpha}(G_0) = \mu_0$.

Remark 2 Theorem 4 also gives sufficient conditions for the Fisher-consistency of the regression S
functional $T_S$. In fact, according to (12), $T_S$ is also an M functional.



5 Differentiability of estimating functionals 

In this Section we shall first deal with the differentiability of general functionals and then specialize to our
regression case. Let $\mathcal{G}_h$ be a set of distributions on $R^h$. Consider an estimating functional $T : \mathcal{G}_h \to R^k$.
Hampel (1976) defines the influence function of $T$ at $G \in \mathcal{G}_h$ as the function $I_{T,G}(z) : R^h \to R^k$

$$I_{T,G}(z) = \frac{\partial\, T((1 - \varepsilon)G + \varepsilon \delta_z)}{\partial \varepsilon}\bigg|_{\varepsilon = 0}, \tag{16}$$

where $\delta_z$ is the point mass distribution at $z$. Given a distance $d$ on $\mathcal{G}_h$ which metrizes the topology of
convergence in distribution, $T$ is Fréchet differentiable at $G_0$ under $d$ if

$$T(G) - T(G_0) = E_G I_{T,G_0}(z) + o(d(G, G_0)).$$

Fréchet differentiability can be used to prove the asymptotic normality of the estimate. However, Fréchet
differentiability also requires that $I_{T,G}(z)$ be bounded. Since this condition is not satisfied by regression
M estimates, we are going to define a weaker type of differentiability, which suffices to prove asymptotic
normality.

Definition 2 Let $T$ be an estimating functional that is weakly continuous at $G_0$, and consider a sequence
$\{G_n\}$ such that $G_n \to_w G_0$. We say that $T$ is weakly differentiable at $\{G_n\}$ if

$$T(G_n) - T(G_0) = E_{G_n} I_{T,G_0}(z) + o(\|E_{G_n} I_{T,G_0}(z)\|). \tag{17}$$

The definition of weak differentiability helps to understand the asymptotic behavior of $T(G_n) - T(G_0)$,
as the next Lemma shows.

Lemma 1 Consider a random sequence of distributions $\{G_n\}$ converging weakly to $G_0$ a.s. Suppose that $T$
is weakly differentiable at $\{G_n\}$ a.s. and that for some sequence $\{a_n\}$ of real numbers

$$a_n E_{G_n} I_{T,G_0}(z) \to_d H.$$

Then

$$a_n (T(G_n) - T(G_0)) = a_n E_{G_n} I_{T,G_0}(z) + o_p(1), \tag{18}$$

and therefore $a_n (T(G_n) - T(G_0)) \to_d H$ too.






The proof of this Lemma is immediate.

Remark. Note that if (18) holds for a joint functional $T = (T_1, T_2)$, it also holds for $T_1$, i.e.,

$$a_n (T_1(G_n) - T_1(G_0)) = a_n E_{G_n} I_{T_1,G_0}(z) + o_p(1). \tag{19}$$

We now deal with the differentiability of a general M estimating functional, i.e., a functional $T$ defined
on a subset of $\mathcal{G}_p$ with values in $R^q$, that for some function $\Psi : R^p \times R^q \to R^q$ satisfies the equation

$$E_G \Psi(z, T(G)) = 0. \tag{20}$$

We will assume that $\Psi$ is continuously differentiable with respect to $\theta$ and call $\dot\Psi(z, \theta)$ (or alternatively
$\partial \Psi(z, \theta)/\partial \theta$) the $q \times q$ differential matrix with elements $\dot\Psi_{jk}(z, \theta) = \partial \Psi_j(z, \theta)/\partial \theta_k$. Define

$$D(G, \theta) = E_G \dot\Psi(z, \theta). \tag{21}$$

Let $\theta_0 = T(G_0)$ and assume that

$$D_0 = D(G_0, \theta_0) \tag{22}$$

exists. Suppose that $D_0$ is nonsingular, that $T$ is weakly continuous at $G_0$ and that there exists $\eta > 0$ such
that

$$E_{G_0} \sup_{\|\theta - \theta_0\| \leq \eta} \|\dot\Psi(z, \theta)\| < \infty, \tag{23}$$

where $\|\cdot\|$ denotes the $l_2$ norm. Then, it is easy to show that the influence function of $T$ at $G_0$ is given by

$$I_{T,G_0}(z) = -D_0^{-1} \Psi(z, \theta_0). \tag{24}$$
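
Formula (24) can be compared with the definition (16) in a simple special case. The following sketch is an illustration under assumptions that are not part of the text: a location M functional with $\Psi(z, \theta) = \psi(z - \theta)$, a bisquare-type score with tuning constant `K = 4.685`, fixed unit scale, $G_0 = N(0, 1)$ (so that $\theta_0 = 0$), expectations computed by a Riemann sum, and the root of the contaminated estimating equation nearest to 0 taken as the value of the functional. All names are hypothetical.

```python
import numpy as np

K = 4.685  # bisquare tuning constant (an assumption for this illustration)


def psi(u):
    """Bisquare-type score u * (1 - (u/K)^2)^2 on [-K, K], zero outside."""
    v = u / K
    return np.where(np.abs(v) <= 1.0, u * (1.0 - v ** 2) ** 2, 0.0)


def dpsi(u):
    """Derivative of psi: (1 - v^2) * (1 - 5 v^2) with v = u / K."""
    v = u / K
    return np.where(np.abs(v) <= 1.0, (1.0 - v ** 2) * (1.0 - 5.0 * v ** 2), 0.0)


# Quadrature weights for G0 = N(0, 1) on a uniform grid.
GRID = np.linspace(-10.0, 10.0, 20001)
W = np.exp(-0.5 * GRID ** 2) / np.sqrt(2.0 * np.pi) * (GRID[1] - GRID[0])


def influence(z):
    """Formula (24) with Psi(z, theta) = psi(z - theta) and theta0 = 0."""
    d0 = -np.sum(dpsi(GRID) * W)   # D0 = E_G0 of d/dtheta psi(z - theta) at theta0
    return -psi(z) / d0


def t_contaminated(z, eps, n_newton=20):
    """Newton iterations for (1 - eps) E_G0 psi(u - t) + eps psi(z - t) = 0, from t = 0."""
    theta = 0.0
    for _ in range(n_newton):
        h = (1.0 - eps) * np.sum(psi(GRID - theta) * W) + eps * psi(z - theta)
        dh = -(1.0 - eps) * np.sum(dpsi(GRID - theta) * W) - eps * dpsi(z - theta)
        theta -= h / dh
    return theta


z, eps = 1.5, 1e-4
finite_diff = t_contaminated(z, eps) / eps   # (T(G_eps) - T(G0)) / eps with T(G0) = 0
print("formula (24):", influence(z), "finite difference:", finite_diff)
```

The two printed numbers should agree up to a term of order `eps`, which is what the derivative in (16) suggests for this smooth score.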
The following conditions are sufficient for the weak differentiability of $T$ at $\{G_n\}$.

Condition 1 $\{G_n\}$ is a sequence of distribution functions that converges weakly to $G_0$ and

$$\lim_{\eta \to 0} \limsup_{n \to \infty} \sup_{\|\theta - \theta_0\| \leq \eta} \|D(G_n, \theta) - D_0\| = 0. \tag{25}$$

Condition 2 $\{G_n\}$ is a sequence of distribution functions such that, at a neighborhood of $\theta_0$, for each $n$

$$\frac{\partial}{\partial \theta} E_{G_n} \Psi(z, \theta) = E_{G_n} \frac{\partial}{\partial \theta} \Psi(z, \theta). \tag{26}$$

Theorem 5 Assume that $T$ is an M functional satisfying (20) and weakly continuous at $G_0$, that $\dot\Psi(z, \theta)$ is
continuous in $\theta$, that $D_0$ is nonsingular and that there exists $\eta > 0$ such that (23) holds. Suppose that $\{G_n\}$ satisfies
Condition 1 and Condition 2; then $T$ is weakly differentiable at $\{G_n\}$.

The following Theorem gives sufficient conditions for a.s. differentiability of M functionals, at a random 
sequence of distributions. In particular, it includes the case where G n are the empirical distributions of 
observations corresponding to an ergodic process. 

Theorem 6 Let $\{G_n\}$ be a sequence of random distributions converging weakly to $G_0$ and satisfying Condition
2 a.s. Assume also that $\dot\Psi(z, \theta)$ is continuous in $\theta$, that there exists $\eta > 0$ such that (23) holds and that
$D_0$ is nonsingular. Let $T$ be an M functional satisfying (20) and weakly continuous at $G_0$. Then $T$ is
weakly differentiable at $\{G_n\}$ a.s. in either of the following two cases: (a) for each function $d(z)$ such that
$E_{G_0} |d(z)| < \infty$, on a set of probability one we have that $\{E_{G_n} d(z)\}$ converges to $E_{G_0} d(z)$, or (b) $\dot\Psi(z, \theta)$ is
bounded.
Corollary 2 Let $\{G_n\}$ be a sequence of empirical distributions associated with i.i.d. $\{z_i\}$ with distribution $G_0$.
Assume that $\dot\Psi(z, \theta)$ is continuous in $\theta$, that there exists $\eta > 0$ such that (23) holds, that $D_0$ is nonsingular
and that $I_{T,G_0}(z)$ has finite second moments under $G_0$. Let $T$ be an M functional continuous at $G_0$. Then
$n^{1/2}(T(G_n) - T(G_0)) \to_d N(0, V)$ with

$$V = E_{G_0} I_{T,G_0}(z) I_{T,G_0}(z)'. \tag{27}$$



6 MM estimates 

In this Section we will summarize the properties derived from Theorems 1-6 for S and MM estimates of
regression and location.

6.1 Regression case 

Recall that MM estimates, which we denote here by $T_{MM} = (T_{MM,\beta}, T_{MM,\alpha})$, are defined in (10), where $\tilde S$ is
the functional $S$ defined in (9) with $\rho_1 \leq \rho_0$, where we use $\rho_1$ to denote the $\rho$-function employed in (11).
As mentioned above, the definition of $\hat\xi_{MM}$ in (3) also requires $\hat\sigma$ defined by (6), and hence also $\hat\xi_S$ defined in
(5). Therefore, these three estimates must be considered simultaneously. Call

$$\hat\theta = (\hat\xi_S, \hat\xi_{MM}, \hat\sigma) \tag{28}$$

the joint solution of (3)-(5)-(6).

In the remainder of this Section we assume the following properties:
R2. $\rho_0$ and $\rho_1$ are twice continuously differentiable.

We denote by $\psi_0$ and $\psi_1$ the derivatives of $\rho_0$ and $\rho_1$, respectively. Assume also that $g(x, \beta)$ satisfies
R3. $g$ is twice continuously differentiable with respect to $\beta$.

We denote by $\dot{\underline{g}}(x, \xi)$ and $\ddot{\underline{g}}(x, \xi)$ the vector of first derivatives and the matrix of second derivatives of $\underline{g}$
with respect to $\xi$, respectively. Analogously we denote by $\dot g(x, \beta)$ and $\ddot g(x, \beta)$ the vector of first derivatives
and the matrix of second derivatives of $g$ with respect to $\beta$, respectively.

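Differentiating (3) we have that $\hat\xi_{MM}$ satisfies the system

$$\frac{1}{n}\sum_{i=1}^n \psi_1\left(\frac{y_i - \underline{g}(x_i, \hat\xi_{MM})}{\hat\sigma}\right) \dot{\underline{g}}(x_i, \hat\xi_{MM}) = 0. \tag{29}$$

It is immediate that $\hat\xi_S$ also satisfies

$$\hat\xi_S = \arg\min_{\xi \in B \times R} \frac{1}{n}\sum_{i=1}^n \rho_0\left(\frac{y_i - \underline{g}(x_i, \xi)}{\hat\sigma}\right).$$

Then, differentiating this equation we get

$$\frac{1}{n}\sum_{i=1}^n \psi_0\left(\frac{y_i - \underline{g}(x_i, \hat\xi_S)}{\hat\sigma}\right) \dot{\underline{g}}(x_i, \hat\xi_S) = 0. \tag{30}$$

Finally, according to (4), $\hat\sigma$ satisfies

$$\frac{1}{n}\sum_{i=1}^n \rho_0\left(\frac{y_i - \underline{g}(x_i, \hat\xi_S)}{\hat\sigma}\right) - \delta = 0. \tag{31}$$

Then $\hat\theta$ satisfies the system of $2q + 3$ equations (29)-(30)-(31). Putting $z_i = (x_i, y_i)$ and denoting by $G_n$ the
empirical distribution of $\{z_1, \ldots, z_n\}$, this system can be written as

$$\frac{1}{n}\sum_{i=1}^n \Psi(z_i, \hat\theta) = 0, \tag{32}$$

where, if $\theta = (\xi_S, \xi_{MM}, \sigma)$, $\Psi(z, \theta)$ is defined by

$$\Psi(z, \theta) = \begin{pmatrix} \psi_0\left(\dfrac{y - \underline{g}(x, \xi_S)}{\sigma}\right) \dot{\underline{g}}(x, \xi_S) \\ \psi_1\left(\dfrac{y - \underline{g}(x, \xi_{MM})}{\sigma}\right) \dot{\underline{g}}(x, \xi_{MM}) \\ \rho_0\left(\dfrac{y - \underline{g}(x, \xi_S)}{\sigma}\right) - \delta \end{pmatrix}.$$

Let

$$T(G) = (T_S(G), T_{MM}(G), S(G)) \tag{33}$$

be the estimating functional associated with $\hat\theta$. Then, if (23) holds, we can differentiate the functions to be
minimized in (10) and (12) inside the expectation, obtaining that $T(G)$ satisfies the equation

$$E_G \Psi(z, T(G)) = 0. \tag{34}$$

Note that the solution to this equation is in general not unique, and therefore, $T$ is not defined exclusively
by this equation.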

To verify (23), in addition to R0-R3 we need the following assumption:
R4. For some $\eta > 0$

$$E \sup_{\|\beta - \beta_0\| \leq \eta} \|\dot g(x, \beta)\| < \infty \quad \text{and} \quad E \sup_{\|\beta - \beta_0\| \leq \eta} \|\ddot g(x, \beta)\| < \infty. \tag{35}$$

Suppose that $D_0$ defined by (22) is nonsingular; then, under these assumptions, we also get that $I_{T,G_0}(z)$ has
finite second moments under $G_0$. Note that in the case of linear regression, (35) reduces to $E_{G_0} \|x\| < \infty$.
Define

$$\alpha_{0i} = \arg\min_t E_{F_0} \rho_i\left(\frac{u - t}{S(G_0)}\right), \quad i = 0, 1, \tag{36}$$

where $F_0$ is the distribution of $u_i$ in model (1). We will see in Theorem 7 that under some general conditions,
$T_{S,\alpha}(G_0) = \alpha_{00}$ and $T_{MM,\alpha}(G_0) = \alpha_{01}$.

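Put $\theta_0 = (\beta_0, \alpha_{00}, \beta_0, \alpha_{01}, \sigma_0)$ with $\sigma_0 = S(G_0)$. The following numbers, vectors and matrices are required
to derive a closed formula for the influence functions of $T_{MM}$ and $T_S$. Let

$$a_{0i} = E_{F_0} \psi_i'\left(\frac{u - \alpha_{0i}}{\sigma_0}\right), \quad i = 0, 1,$$

$$e_{0i} = E_{F_0} \psi_i'\left(\frac{u - \alpha_{0i}}{\sigma_0}\right)\left(\frac{u - \alpha_{0i}}{\sigma_0}\right), \quad i = 0, 1,$$

$$d_0 = E_{F_0} \psi_0\left(\frac{u - \alpha_{00}}{\sigma_0}\right)\left(\frac{u - \alpha_{00}}{\sigma_0}\right),$$

$$b_0 = E_{G_0} \dot g(x, \beta_0), \quad \underline{b}_0 = (b_0', 1)',$$

$$A_0 = E_{G_0} (\dot g(x, \beta_0) - b_0)(\dot g(x, \beta_0) - b_0)'$$

and

$$C_0 = \begin{pmatrix} A_0 + b_0 b_0' & b_0 \\ b_0' & 1 \end{pmatrix}. \tag{37}$$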

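It is shown in Section 7.5 that the influence function of $T_{MM}$ is given by

$$I_{T_{MM,\beta},G_0}(x, y) = \frac{\sigma_0}{a_{01}} \psi_1\left(\frac{y - g(x, \beta_0) - \alpha_{01}}{\sigma_0}\right) A_0^{-1} (\dot g(x, \beta_0) - b_0) \tag{38}$$

and

$$I_{T_{MM,\alpha},G_0}(x, y) = \frac{\sigma_0}{a_{01}} \psi_1\left(\frac{y - g(x, \beta_0) - \alpha_{01}}{\sigma_0}\right) \left(1 + b_0' A_0^{-1} (b_0 - \dot g(x, \beta_0))\right) - \frac{e_{01} \sigma_0}{a_{01} d_0} \left(\rho_0\left(\frac{y - g(x, \beta_0) - \alpha_{00}}{\sigma_0}\right) - \delta\right). \tag{39}$$

The influence functions of $T_{S,\beta}$ and $T_{S,\alpha}$ can be obtained similarly, replacing $a_{01}$, $\alpha_{01}$ and $e_{01}$ by $a_{00}$, $\alpha_{00}$
and $e_{00}$, respectively.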

If the errors $u_i$ have a symmetric distribution $F_0$, then $e_{01} = 0$ and $\alpha_{01} = \alpha_{00} = \alpha_0$, the center of
symmetry of $F_0$. This entails a considerable simplification of the influence function $I_{T_{MM}}$. In fact, in this
case we get

$$I_{T_{MM},G_0}(x, y) = \frac{\sigma_0}{a_{01}} \psi_1\left(\frac{y - g(x, \beta_0) - \alpha_0}{\sigma_0}\right) C_0^{-1} (\dot g(x, \beta_0)', 1)',$$

and the asymptotic covariance matrix (27) is

$$V = \sigma_0^2 \frac{E_{F_0} \psi_1^2((u - \alpha_0)/\sigma_0)}{\left(E_{F_0} \psi_1'((u - \alpha_0)/\sigma_0)\right)^2} C_0^{-1}. \tag{40}$$
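
The scalar factor in (40) determines the efficiency of the MM estimate relative to least squares. As a minimal sketch under assumptions made here (standard normal errors, $\sigma_0 = 1$, $\alpha_0 = 0$, and a bisquare $\rho_1$ with the commonly used tuning constant 4.685), the factor can be evaluated by numerical integration. The score below omits the constant $6/k^2$ of the normalized bisquare, which cancels in the ratio; the function names are hypothetical.

```python
import numpy as np


def psi_bisquare(u, k):
    """Bisquare score, up to the multiplicative constant 6 / k^2."""
    v = u / k
    return np.where(np.abs(v) <= 1.0, u * (1.0 - v ** 2) ** 2, 0.0)


def dpsi_bisquare(u, k):
    """Derivative of psi_bisquare: (1 - v^2) * (1 - 5 v^2) with v = u / k."""
    v = u / k
    return np.where(np.abs(v) <= 1.0, (1.0 - v ** 2) * (1.0 - 5.0 * v ** 2), 0.0)


def variance_factor(k, sigma0=1.0, alpha0=0.0):
    """sigma0^2 * E psi1^2 / (E psi1')^2 under F0 = N(alpha0, 1), as in (40)."""
    u = np.linspace(alpha0 - 10.0, alpha0 + 10.0, 20001)
    w = np.exp(-0.5 * (u - alpha0) ** 2) / np.sqrt(2.0 * np.pi) * (u[1] - u[0])
    t = (u - alpha0) / sigma0
    num = np.sum(psi_bisquare(t, k) ** 2 * w)
    den = np.sum(dpsi_bisquare(t, k) * w) ** 2
    return sigma0 ** 2 * num / den


# With unit error variance the LS covariance is C0^{-1}, so the relative
# efficiency is the reciprocal of the scalar factor multiplying C0^{-1}.
efficiency = 1.0 / variance_factor(4.685)
print("normal efficiency for k = 4.685:", efficiency)
```

Increasing `k` moves the factor toward 1, which is the numerical counterpart of the statement that the normal efficiency can be made as close to one as desired.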

Theorem 7 summarizes the properties of S and MM regression functionals.

Theorem 7 Let $z = (x, y)$ satisfy model (1), where the distribution $F_0$ of $u_i$ has a strongly unimodal density
and the identifiability condition (2) holds. Assume that $\rho_0$ and $\rho_1$ are bounded $\rho$-functions that satisfy R1,
with $\rho_1(u) \leq \rho_0(u)$. Let $T$ be defined by (33) and $G_0$ be the distribution of $(x, y)$. Then, we have:

(i) $T_S(G_0) = (\beta_0, \alpha_{00})$ is the unique minimizer in (8). If $F_0$ is symmetric with respect to $\mu_0$ we have
$\alpha_{00} = \mu_0$.

(ii) $T_{MM}(G_0) = (\beta_0, \alpha_{01})$ is the unique minimizer in (10). If $F_0$ is symmetric with respect to $\mu_0$ we have
$\alpha_{01} = \mu_0$.

(iii) The functional $T = (T_S, T_{MM}, S)$ is weakly continuous at $G_0$ if either (a) $B$ is compact, or (b) $B = R^p$,
$g(x, \beta) = \beta' x$ and $\delta < 1 - c(G_0)$.

(iv) Assume also that R2, R3 and R4 hold, that $a_{00} \neq 0$, $a_{01} \neq 0$, $d_0 \neq 0$ and that $A_0$ is invertible. Then
$D_0 = E_{G_0} \dot\Psi(z, T(G_0))$ is invertible, and $I_{T_{MM,\beta},G_0}(x, y)$ and $I_{T_{MM,\alpha},G_0}(x, y)$ are given by (38) and (39),
respectively, while the influence functions $I_{T_{S,\beta},G_0}(x, y)$ and $I_{T_{S,\alpha},G_0}(x, y)$ have similar expressions,
replacing $a_{01}$, $\alpha_{01}$ and $e_{01}$ by $a_{00}$, $\alpha_{00}$ and $e_{00}$, respectively.
(v) Under the same assumptions as in (iv), let $\{G_n\}$ be a sequence of random distributions converging
weakly to $G_0$ and satisfying Condition 2 a.s. Suppose also that for each function $d(z)$ such that
$E_{G_0} |d(z)| < \infty$, we have that $\{E_{G_n} d(z)\}$ converges to $E_{G_0} d(z)$ a.s. Then, the functional $T$ is weakly
differentiable at $\{G_n\}$.

(vi) Assume the same conditions as in (iv) and that

$$n^{1/2} E_{G_n} I_{T,G_0}(z) \to_d H. \tag{41}$$

Then

$$n^{1/2} (T(G_n) - T(G_0)) = n^{1/2} E_{G_n} I_{T,G_0}(z) + o_p(1) \tag{42}$$

and therefore

$$n^{1/2} (T(G_n) - T(G_0)) \to_d H. \tag{43}$$

(vii) Assume the same conditions as in (iv). Let $\{G_n\}$ be the sequence of empirical distributions corresponding
to i.i.d. observations $\{(x_i, y_i) : i \geq 1\}$ with common distribution $G_0$. Then (41) holds with $H = N(0, V)$
and $V$ given by (27).

6.2 Location case 

The location model corresponds to the case where there are no regressors: $p = q = 0$ and so $y_i = u_i$ and
$\xi = \alpha$. If $F_0$ denotes the common distribution of the $u_i$, then $T(F_0) = (T_S(F_0), T_{MM}(F_0), S(F_0))$ is defined
as in the regression case with $\underline{g}(x, \xi)$ replaced by $\alpha$. Then, the resulting $T_{MM} = T_{MM,\alpha}$ and $T_S = T_{S,\alpha}$ are
the location functionals while $S$ is a functional estimating the error scale. In this case, $I_{T_{MM},F_0}$ is given by

$$I_{T_{MM},F_0}(y) = \frac{\sigma_0}{a_{01}} \psi_1\left(\frac{y - \alpha_{01}}{\sigma_0}\right) - \frac{e_{01} \sigma_0}{a_{01} d_0} \left(\rho_0\left(\frac{y - \alpha_{00}}{\sigma_0}\right) - \delta\right). \tag{44}$$

The following Theorem summarizes the properties of T that can be derived from the Theorems in the 
former sections. 

Theorem 8 Assume that $\rho_0$ and $\rho_1$ are bounded $\rho$-functions that satisfy R1, with $\rho_1 \leq \rho_0$. We assume that
$F_0$ has a strongly unimodal density. Then

(i) $T_S(F_0) = \alpha_{00}$ is the unique minimizer in (8). If $F_0$ is symmetric with respect to $\mu_0$ we have $\alpha_{00} = \mu_0$.

(ii) $T_{MM}(F_0) = \alpha_{01}$ is the unique minimizer in (10). If $F_0$ is symmetric with respect to $\mu_0$ we have $\alpha_{01} = \mu_0$.

(iii) The functional $T = (T_S, T_{MM}, S)$ is weakly continuous at $F_0$.

(iv) Assume also that R2 holds and that $a_{00} \neq 0$, $a_{01} \neq 0$, $d_0 \neq 0$. Then, $D_0 = E_{F_0} \dot\Psi(z, T(F_0))$ is invertible, and
$I_{T_{MM},F_0}(y)$ is given by (44). The influence function $I_{T_S,F_0}(y)$ has a similar expression, replacing $a_{01}$, $\alpha_{01}$
and $e_{01}$ by $a_{00}$, $\alpha_{00}$ and $e_{00}$, respectively.

(v) Under the same assumptions as in (iv), let $\{F_n\}$ be a sequence of random distributions converging
weakly to $F_0$ and satisfying Condition 2 a.s. Then $T$ is a.s. weakly differentiable at $\{F_n\}$.
(vi) Assume the same conditions as in (iv) and

$$n^{1/2} E_{F_n} I_{T,F_0}(y) \to_d H. \tag{45}$$

Then

$$n^{1/2} (T(F_n) - T(F_0)) = n^{1/2} E_{F_n} I_{T,F_0}(y) + o_p(1), \tag{46}$$

and therefore

$$n^{1/2} (T(F_n) - T(F_0)) \to_d H. \tag{47}$$

(vii) Assume the same conditions as in (iv). Let $\{F_n\}$ be the sequence of empirical distributions
corresponding to i.i.d. observations $u_i$ with common distribution $F_0$. Then (45) holds with $H = N(0, V)$
and $V$ given by (27).

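If $F_0$ is symmetric, the asymptotic variance of $T_{MM}$ given by (40) becomes (with $C_0 = 1$, since there are no regressors)

$$V = \sigma_0^2 \frac{E_{F_0} \psi_1^2((u - \alpha_0)/\sigma_0)}{\left(E_{F_0} \psi_1'((u - \alpha_0)/\sigma_0)\right)^2}.$$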



7 Proofs 

Before proving Theorems 1 and 2 we need some auxiliary results. 

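Lemma 2 Consider distributions $\{G_n\}$ and $G_0$ on $R^p \times R$. Let $\{\xi_n\}$ and $\{\sigma_n\}$ be sequences in $B \times R$ and
$R^+$ respectively, such that $\xi_n \to \xi \in B \times R$ and $\sigma_n \to \sigma > 0$. Assume that $\underline{g}(x, \xi)$ is continuous in $\xi$. If
$G_n \to_w G_0$, then

$$\lim_{n \to \infty} E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) = E_{G_0} \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right).$$

Proof. Since $G_n \to_w G_0$ and $\rho$ is continuous and bounded, we have

$$E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \to E_{G_0} \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right),$$

and therefore it suffices to show that

$$E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) - E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \to 0.$$

Since $\{G_n\}_{n \geq 1}$ is tight, it suffices to show that if $\mathcal{P}$ is a tight set of distributions of $(x, y)$, then

$$\sup_{F \in \mathcal{P}} \left| E_F \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) - E_F \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \right| \to 0.$$

To prove this, put $z = (x, y)$. Then for all $K > 0$

$$\left| E_F \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) - E_F \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \right| \leq 2 E_F 1_{\{\|z\| > K\}} + E_F \left| \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) - \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \right| 1_{\{\|z\| \leq K\}}. \tag{48}$$

If $\|z\| \leq K$ we have

$$\left| \frac{y - \underline{g}(x, \xi_n)}{\sigma_n} - \frac{y - \underline{g}(x, \xi)}{\sigma} \right| \leq \frac{1}{\sigma \sigma_n} \left[ |\sigma_n - \sigma| |y| + |\sigma_n - \sigma| |\underline{g}(x, \xi)| + \sigma |\underline{g}(x, \xi_n) - \underline{g}(x, \xi)| \right]. \tag{49}$$

Now, given $\varepsilon > 0$, we can find $K$ such that

$$2 \sup_{F \in \mathcal{P}} P_F(\|z\| > K) \leq \varepsilon/2,$$

and $a$ such that

$$|\rho(u) - \rho(v)| \leq \varepsilon/2 \quad \text{if } |u - v| \leq a.$$

Then, we can choose $n_0$ such that the right-hand side of (49) is smaller than $a$ if $n \geq n_0$ and $\|z\| \leq K$, and
so from (48) we obtain for all $n \geq n_0$

$$\left| E_F \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) - E_F \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma}\right) \right| \leq \varepsilon, \quad \forall F \in \mathcal{P}.$$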



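Lemma 3 Assume that $B$ is closed and let $G_0$ be any distribution for $(x, y)$ such that (10) has a unique
solution $T_M(G_0)$. Let $\{G_n\}$ be a sequence such that $G_n \to_w G_0$ and $\{T_M(G_n)\}$ is bounded. If $\tilde S(G_n) \to
\tilde S(G_0) > 0$ then $T_M(G_n) \to T_M(G_0)$.

Proof. Put for brevity

$$\xi_n = T_M(G_n), \quad \xi_0 = T_M(G_0), \quad \sigma_n = \tilde S(G_n), \quad \sigma_0 = \tilde S(G_0). \tag{50}$$

Since $\{\xi_n\}$ remains in a compact set, it suffices to prove that $\xi_0$ is the only accumulation point of $\{\xi_n\}$, i.e.,
if a subsequence tends to some $\xi$, then $\xi = \xi_0$. Without loss of generality assume that $\xi_n \to \xi$. The definition
of $\xi_n$ implies

$$E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi_n)}{\sigma_n}\right) \leq E_{G_n} \rho\left(\frac{y - \underline{g}(x, \xi_0)}{\sigma_n}\right). \tag{51}$$

Using Lemma 2 we get

$$M_{G_0}(\xi) = E_{G_0} \rho\left(\frac{y - \underline{g}(x, \xi)}{\sigma_0}\right) \leq E_{G_0} \rho\left(\frac{y - \underline{g}(x, \xi_0)}{\sigma_0}\right) = M_{G_0}(\xi_0).$$

Since $\xi_0$ is the only minimizer of $M_{G_0}$, we conclude that $\xi = \xi_0$.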



Lemma 4 Lei a^d I 17 "} ^ e sequences in R p+1 and respectively. Assume that when n — > oo, 

G n — >«, Gq, ||£n|| ~~ ^ o° {°Vi} * s bounded. Then 



lim inf Eg n /) 
where cq = c(Gq) is defined in M3\ ). 



y-C n (x',i) 



> 1 - c , 



(52) 
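Proof. Assume without loss of generality that there exist $\gamma \in \mathbb{R}^{p+1}$ and $\sigma > 0$ such that, for some subsequence, $\gamma_n = \xi_n/\|\xi_n\| \to \gamma$ and $\sigma_n \le \sigma$. Put $\lambda_n = \|\xi_n\|$.

For $\varepsilon > 0$ let $d_\varepsilon$ be such that $\rho(u) \ge 1 - \varepsilon$ for $|u| \ge d_\varepsilon$. Therefore,

\[
E_{G_n}\rho\!\left(\frac{y - \xi_n'(x',1)'}{\sigma_n}\right)
\ge (1-\varepsilon)\, P_{G_n}\!\left(\frac{|y - \xi_n'(x',1)'|}{\sigma_n} \ge d_\varepsilon\right)
\ge (1-\varepsilon)\, P_{G_n}\!\left(|y - \lambda_n\gamma_n'(x',1)'| \ge d_\varepsilon\sigma\right).
\]

Then, to prove the Lemma, it suffices to show that

\[
\liminf_{n\to\infty} P_{G_n}\!\left(\left|\frac{y}{\lambda_n} - \gamma_n'(x',1)'\right| \ge \frac{d_\varepsilon\sigma}{\lambda_n}\right) \ge 1 - c_0.
\]

Let $(x_n, y_n) \sim G_n$ and $(x_0, y_0) \sim G_0$. Since $\lambda_n \to \infty$, we have $y_n/\lambda_n \to_p 0$. Then the convergence of $\gamma_n$ to $\gamma$ guarantees that

\[
\frac{y_n}{\lambda_n} - \gamma_n'(x_n',1)' \to_d -\gamma'(x_0',1)'.
\]

For any $a > 0$ which is a point of continuity of the distribution of $|\gamma'(x_0',1)'|$, $\lambda_n \to \infty$ implies

\[
\liminf_{n\to\infty} P_{G_n}\!\left(\left|\frac{y}{\lambda_n} - \gamma_n'(x',1)'\right| \ge \frac{d_\varepsilon\sigma}{\lambda_n}\right)
\ge \liminf_{n\to\infty} P_{G_n}\!\left(\left|\frac{y}{\lambda_n} - \gamma_n'(x',1)'\right| > a\right)
= P_{G_0}\!\left(|\gamma'(x',1)'| > a\right).
\]

Letting $a \to 0$ and recalling (13) we get

\[
\liminf_{n\to\infty} P_{G_n}\!\left(\left|\frac{y}{\lambda_n} - \gamma_n'(x',1)'\right| \ge \frac{d_\varepsilon\sigma}{\lambda_n}\right) \ge 1 - c_0.
\]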
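The bound in Lemma 4 can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: it takes $p = 1$, a hypothetical three-atom distribution `ATOMS` for $(x, y)$, a bisquare $\rho$ normalised so that $\sup\rho = 1$, and it assumes that $c(G)$ in (13) is the supremum over $\gamma \ne 0$ of $P_G(\gamma'(x',1)' = 0)$. None of these choices comes from the paper.

```python
def rho_bisquare(u, k=1.0):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= k."""
    v = min(abs(u) / k, 1.0)
    return 1.0 - (1.0 - v * v) ** 3

# Hypothetical discrete distribution of (x, y): tuples (x, y, probability).
ATOMS = [(1.0, 0.0, 0.4), (0.0, 0.5, 0.3), (2.0, -0.5, 0.3)]

def expected_rho(slope, intercept, sigma=1.0):
    """E_G rho((y - xi'(x, 1)') / sigma) with xi = (slope, intercept)."""
    return sum(p * rho_bisquare((y - slope * x - intercept) / sigma)
               for x, y, p in ATOMS)

def c_of_G():
    """Assumed form of c(G) for p = 1: gamma'(x, 1)' = 0 with gamma != 0
    pins x to a single value, so the supremum is the largest x-atom mass."""
    masses = {}
    for x, _, p in ATOMS:
        masses[x] = masses.get(x, 0.0) + p
    return max(masses.values())

c0 = c_of_G()
for lam in (10.0, 100.0, 1000.0):
    # xi = lam * (1, -1): the fitted line passes through the heaviest atom x = 1.
    worst = expected_rho(lam, -lam)
    # xi = lam * (1, 0): a direction that misses the atom at x = 1.
    other = expected_rho(lam, 0.0)
    print(lam, worst, other, 1.0 - c0)
```

In this toy case the direction through the heaviest atom attains the bound $1 - c_0$, while the other direction stays strictly above it.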



The proof of the following Lemma is similar to that of Lemma 4.

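Lemma 5 Let $\{\xi_n\}$ be a sequence in $B \times \mathbb{R}$, with $B$ compact. Assume that when $n \to \infty$, $G_n \to_w G_0$, $\|\xi_n\| \to \infty$ and $\{\sigma_n\}$ is bounded. Then

\[
\liminf_{n\to\infty} E_{G_n}\rho\!\left(\frac{y - g(x,\xi_n)}{\sigma_n}\right) = 1. \tag{53}
\]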



Finally, the following result will be used.



Lemma 6 Let $S(G)$ be as defined above and suppose that $S(G_0) > 0$. Then $G_n \to_w G_0$ implies that there exists $n_0$ such that $S(G_n) > 0$ for $n \ge n_0$.

Proof. Suppose that the Lemma is not true. Then there exists a subsequence $\{G_{n_k}\}_{k \ge 1}$ such that $S(G_{n_k}) = 0$ for all $k$. This means that there exists $(\beta_{n_k}, \alpha_{n_k})$ such that

\[
E_{G_{n_k}}\rho_0\!\left(\frac{y - g(x,\beta_{n_k}) - \alpha_{n_k}}{s}\right) \le \delta \quad \text{for any } s > 0.
\]

The same arguments used to prove Lemma 2 show that $\{(\beta_{n_k}, \alpha_{n_k})\}$ is bounded, and therefore (passing to a subsequence if necessary) we can assume that $(\beta_{n_k}, \alpha_{n_k}) \to (\beta, \alpha)$. Then, from Lemma 2 we get that

\[
E_{G_0}\rho_0\!\left(\frac{y - g(x,\beta) - \alpha}{s}\right) \le \delta \quad \text{for any } s > 0.
\]

Then $S(G_0) \le S^*(G_0,\beta,\alpha) \le s$. Since this holds for any $s > 0$, we get that $S(G_0) = 0$. This contradicts the assumption that $S(G_0) > 0$.






7.1 Proof of Theorem 1

Let $G_n \to_w G_0$. Since $S$ is weakly continuous at $G_0$, it follows that $S(G_n) \to S(G_0)$, and $S(G_0) > 0$ by hypothesis.

Case (a): We prove first that $\{T_M(G_n)\}$ is bounded. Suppose that it is not true; then without loss of generality we may assume that $\|T_M(G_n)\| \to \infty$. Then Lemma 5 implies

\[
1 = \liminf_{n\to\infty} M_{G_n}(T_M(G_n)) \le \liminf_{n\to\infty} M_{G_n}(T_M(G_0)) = M_{G_0}(T_M(G_0)),
\]

and, since $T_M(G_0)$ minimizes $M_{G_0}$ and $\rho \le 1$, this implies that $M_{G_0}(\xi) = 1$ for all $\xi$. This contradicts the assumption that $T_M(G_0)$ is uniquely defined. Then $\{T_M(G_n)\}$ is bounded, and from Lemma 3 we get that $T_M(G_n) \to T_M(G_0)$.

Case (b): Recall the notation in (50). Convergence of $\{\sigma_n\}$ guarantees that it is a bounded sequence. Suppose that $\{\xi_n\}$ is unbounded. Then, passing on to a subsequence if necessary, we may assume that $\|\xi_n\| \to \infty$. In this case by Lemma 4 we have

\[
\liminf_{n\to\infty} M_{G_n}(\xi_n) = \liminf_{n\to\infty} E_{G_n}\rho\!\left(\frac{y - \xi_n'(x',1)'}{\sigma_n}\right) \ge 1 - c_0. \tag{54}
\]

We also have

\[
\lim_{n\to\infty} M_{G_n}(\xi_0) = \lim_{n\to\infty} E_{G_n}\rho\!\left(\frac{y - \xi_0'(x',1)'}{\sigma_n}\right) = M_{G_0}(\xi_0) < 1 - c_0. \tag{55}
\]

Inequalities (54) and (55) imply that there exists $n_0$ such that for $n \ge n_0$

\[
M_{G_n}(\xi_n) > M_{G_n}(\xi_0),
\]

contradicting the definition of $T_M(G_n)$. Therefore $\{\xi_n\}$ is bounded, and then the weak continuity of $T_M$ follows from Lemma 3.

7.2 Proof of Theorem 2

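Let $G_n \to_w G_0$, $\xi_n = T_S(G_n)$, $\xi_0 = T_S(G_0)$, $\sigma_n = S(G_n)$ and $\sigma_0 = S(G_0)$. We prove first that $\{\sigma_n\}$ is bounded. Take any $\sigma_1 > \sigma_0$; then by Lemma 2

\[
E_{G_n}\rho_0\!\left(\frac{y - g(x,\xi_0)}{\sigma_1}\right) \to E_{G_0}\rho_0\!\left(\frac{y - g(x,\xi_0)}{\sigma_1}\right) < \delta,
\]

and therefore there exists $n_0$ such that

\[
S^*(G_n,\xi_0) < \sigma_1 \quad \text{for } n \ge n_0, \tag{56}
\]

which implies that $S^*(G_n,\xi_0)$ is bounded, and therefore $\sigma_n \le S^*(G_n,\xi_0)$ is also bounded.
On the other hand, by Lemma 6 we get that $\sigma_n > 0$ for $n$ large enough.
We now prove that $\{\xi_n\}$ is bounded. In case (a), if $\{\xi_n\}$ is unbounded, Lemma 5 implies

\[
\liminf_{n\to\infty} E_{G_n}\rho_0\!\left(\frac{y - g(x,\xi_n)}{\sigma_n}\right) \ge 1, \tag{57}
\]

and this contradicts the fact that for all $n$

\[
E_{G_n}\rho_0\!\left(\frac{y - g(x,\xi_n)}{\sigma_n}\right) = \delta < 1.
\]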
Consider now case (b) and assume that $\{\xi_n\}$ is unbounded. Then, passing on to a subsequence if necessary, we may assume that $\|\xi_n\| \to \infty$. Then by Lemma 4

\[
\liminf_{n\to\infty} E_{G_n}\rho_0\!\left(\frac{y - \xi_n'(x',1)'}{\sigma_n}\right) \ge 1 - c_0,
\]

and this contradicts the fact that for all $n$

\[
E_{G_n}\rho_0\!\left(\frac{y - \xi_n'(x',1)'}{\sigma_n}\right) = \delta < 1 - c_0.
\]

Then in case (b) $\{\xi_n\}$ is also bounded.

We now show that $\sigma_n \to \sigma_0$. Suppose that this is not true. By passing on to a subsequence if necessary, we may assume that $\sigma_n \to \sigma^* \ne \sigma_0$ and $\xi_n \to \xi^*$ for some $\xi^*$ and $\sigma^*$. Since (56) holds for any $\sigma_1 > \sigma_0$ we have $\sigma^* \le \sigma_0$, and therefore $\sigma^* < \sigma_0$. Then Lemma 2 implies

\[
\delta = \lim_{n\to\infty} E_{G_n}\rho_0\!\left(\frac{y - g(x,\xi_n)}{\sigma_n}\right) = E_{G_0}\rho_0\!\left(\frac{y - g(x,\xi^*)}{\sigma^*}\right),
\]

and therefore $S(G_0) \le S^*(G_0,\xi^*) = \sigma^* < \sigma_0$. This contradicts the fact that $S(G_0) = \sigma_0$ and shows that $S$ is weakly continuous.

Finally, the weak continuity of $T_S$ follows from (12) and Theorem 1.

7.3 Proofs of Theorems 3 and 4

The following auxiliary result is due to Ibragimov (1956).

Theorem 9 If $f$ is a strongly unimodal density and $\varphi$ is a density such that $\log\varphi$ is concave on its support, the convolution

\[
h(t) = \int_{-\infty}^{\infty} \varphi(u - t) f(u)\,du \tag{58}
\]

is strongly unimodal.

7.3.1 Proof of Theorem 3

(a) Put $k = \int_{-\infty}^{\infty} (1 - \rho(x))\,dx$ and $\varphi(u) = (1 - \rho(u))/k$, which vanishes for $|u| \ge m$. Then

\[
q(t) = 1 - E_F\big(1 - \rho(u - t)\big) = 1 - k\,E_F\,\varphi(u - t) = 1 - k\,h(t),
\]

where $h(t)$ is given by (58). Since by Theorem 9 $h(t)$ is a strongly unimodal density, part (a) of the Theorem follows.

(b) It is proved in Lemma 3.1 of Yohai (1985).
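The representation $q(t) = 1 - k\,h(t)$ used in part (a) can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the paper: $F$ is the standard normal distribution, $\rho$ is a bisquare function with (hypothetical) tuning constant 1.5, and $q(t) = E_F\rho(u - t)$ is approximated by a plain trapezoidal rule.

```python
import math

def rho_bisquare(u, k=1.5):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= k."""
    v = min(abs(u) / k, 1.0)
    return 1.0 - (1.0 - v * v) ** 3

def normal_pdf(u):
    """Density of N(0, 1), used here as an example of a strongly unimodal f."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def q(t, lo=-10.0, hi=10.0, n=4000):
    """Trapezoidal approximation of q(t) = E_F rho(u - t) for F = N(0, 1)."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * rho_bisquare(u - t) * normal_pdf(u)
    return total * step

# q should increase as t moves away from the mode of f (here t = 0).
grid = [0.25 * j for j in range(13)]          # t = 0, 0.25, ..., 3
values = [q(t) for t in grid]
for t, value in zip(grid, values):
    print(f"t = {t:4.2f}   q(t) = {value:.6f}")
```

On this grid the computed values increase with $|t|$, which is the behaviour the strong unimodality of $h$ predicts for a symmetric $f$.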

7.3.2 Proof of Theorem 4

Without loss of generality we may assume $\sigma = 1$. To prove the Theorem we will show that the unique minimum of $R(\beta,\alpha) = E_{G_0}\rho(y - g(x,\beta) - \alpha)$ is $\beta = \beta_0$, $\alpha = t_0$. We will first prove that

\[
R(\beta_0, t_0) < R(\beta_0, \alpha) \quad \text{for } \alpha \ne t_0.
\]

This is equivalent to

\[
E_{F_0}\rho(u - t_0) < E_{F_0}\rho(u - \alpha) \quad \text{for } \alpha \ne t_0,
\]

which follows from Theorem 3.

Consider now $(\beta,\alpha)$ with $\beta \ne \beta_0$. Let $A = \{x : g(x,\beta_0) = g(x,\beta) + \alpha - t_0\}$ and $q$ as in (15), with $F$ replaced by $F_0$. Then

\[
R(\beta,\alpha) = E_{G_0}\{E_{G_0}[\rho(y - g(x,\beta) - \alpha) \mid x]\}
= E_{G_0}\{E_{G_0}[\rho(u + g(x,\beta_0) - g(x,\beta) - \alpha) \mid x]\}. \tag{59}
\]

Since $u$ and $x$ are independent we get

\[
E[\rho(u + g(x,\beta_0) - g(x,\beta) - \alpha) \mid x] = q\big(g(x,\beta) - g(x,\beta_0) + \alpha\big). \tag{60}
\]

Then according to Theorem 3 the left-hand side of (60) is equal to $q(t_0)$ if $x \in A$ and greater than $q(t_0)$ otherwise. The identifiability condition (2) implies that $P(A^c) > 0$, and from (59) we get that $R(\beta,\alpha) > q(t_0)$. Finally, the Theorem follows from the fact that $R(\beta_0, t_0) = q(t_0)$.
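The argument based on (59) and (60) can be traced on a small example. The sketch below is an illustration only, with every concrete choice an assumption: the model $g(x,\beta) = \beta x$, the two-point distribution `X_ATOMS` for $x$, the value `BETA0`, standard normal errors (so that $t_0 = 0$ by symmetry) and a bisquare $\rho$. It evaluates $R(\beta,\alpha) = E\,q(g(x,\beta) - g(x,\beta_0) + \alpha)$ on a grid.

```python
import math
from functools import lru_cache

def rho_bisquare(u, k=1.5):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= k."""
    v = min(abs(u) / k, 1.0)
    return 1.0 - (1.0 - v * v) ** 3

@lru_cache(maxsize=None)
def q(t, half_width=10.0, n=2000):
    """Trapezoidal approximation of q(t) = E rho(u - t) with u ~ N(0, 1)."""
    step = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        u = -half_width + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * rho_bisquare(u - t) * math.exp(-0.5 * u * u)
    return total * step / math.sqrt(2.0 * math.pi)

BETA0 = 2.0                                # hypothetical true parameter
X_ATOMS = [(0.0, 0.5), (1.0, 0.5)]         # hypothetical (x, probability) pairs

def g(x, beta):
    """Hypothetical regression function, identifiable for this design."""
    return beta * x

def R(beta, alpha):
    """R(beta, alpha) computed through the right-hand side of (60)."""
    return sum(p * q(g(x, beta) - g(x, BETA0) + alpha) for x, p in X_ATOMS)

grid = [(BETA0 + 0.5 * i, 0.5 * j) for i in range(-2, 3) for j in range(-2, 3)]
best = min(grid, key=lambda pair: R(*pair))
print("grid minimizer (beta, alpha):", best)
```

With these choices the set $A$ has probability less than one for every $(\beta,\alpha) \ne (\beta_0, 0)$, so the grid minimizer is expected to be $(\beta_0, 0)$.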

7.4 Proof of Theorems 5 and 6

7.4.1 Proof of Theorem 5

Since

\[
E_{G_n}\Psi(z, T(G_n)) = 0,
\]

the Mean Value Theorem together with Condition 2 and the consistency of $T(G_n)$ yield

\[
E_{G_n}\Psi(z, T(G_0)) + D(G_n,\theta_n^*)\,\big(T(G_n) - T(G_0)\big) = 0,
\]

where $\theta_n^* \to \theta_0$. Then (25) implies that $D(G_n,\theta_n^*) \to D_0$ and, since for large $n$ $D(G_n,\theta_n^*)$ is nonsingular, we may write

\[
T(G_n) - T(G_0) = -D(G_n,\theta_n^*)^{-1} E_{G_n}\Psi(z, T(G_0))
= E_{G_n} I_{T,G_0}(z) + \big(D_0^{-1} - D(G_n,\theta_n^*)^{-1}\big) E_{G_n}\Psi(z, T(G_0)).
\]

Condition 1 implies that the second term of the right-hand side divided by $\|E_{G_n} I_{T,G_0}(z)\|$ tends to zero, and this proves the Theorem.

7.4.2 Proof of Theorem 6

Under the assumptions of this Theorem, we can prove that Condition 1 holds a.s. using the same arguments as in Lemma 4.2 of Yohai (1985). The only change is to replace the Law of Large Numbers for i.i.d. random variables by the assumption that $E_{G_n} d(z) \to E_{G_0} d(z)$ a.s. for all $d$ such that $E_{G_0}|d(z)| < \infty$ in case (a), and by the fact that $E_{G_n} d(z) \to E_{G_0} d(z)$ for every bounded and continuous function $d$ in case (b). Then Theorem 5 implies that $T$ is weakly differentiable at $\{G_n\}$.
7.5 Derivations of influence functions

7.5.1 Derivation of (38)-(39)

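Put for brevity

\[
t_{MM} = \frac{y - g(x,\xi_{MM})}{\sigma}, \qquad t_S = \frac{y - g(x,\xi_S)}{\sigma}.
\]

Then

\[
\dot{\Psi}(z,\theta) =
\begin{pmatrix}
\dot{\Psi}_{11}(z,\theta) & 0 & \dot{\Psi}_{13}(z,\theta) \\
0 & \dot{\Psi}_{22}(z,\theta) & \dot{\Psi}_{23}(z,\theta) \\
\dot{\Psi}_{31}(z,\theta) & 0 & \dot{\Psi}_{33}(z,\theta)
\end{pmatrix}, \tag{61}
\]

where

\[
\begin{aligned}
\dot{\Psi}_{11}(z,\theta) &= -\frac{1}{\sigma}\,\psi_0'(t_S)\,\dot{g}(x,\xi_S)\,\dot{g}(x,\xi_S)' + \psi_0(t_S)\,\ddot{g}(x,\xi_S), \\
\dot{\Psi}_{13}(z,\theta) &= -\frac{1}{\sigma}\,\psi_0'(t_S)\,t_S\,\dot{g}(x,\xi_S), \\
\dot{\Psi}_{22}(z,\theta) &= -\frac{1}{\sigma}\,\psi_1'(t_{MM})\,\dot{g}(x,\xi_{MM})\,\dot{g}(x,\xi_{MM})' + \psi_1(t_{MM})\,\ddot{g}(x,\xi_{MM}), \\
\dot{\Psi}_{23}(z,\theta) &= -\frac{1}{\sigma}\,\psi_1'(t_{MM})\,t_{MM}\,\dot{g}(x,\xi_{MM}), \\
\dot{\Psi}_{31}(z,\theta) &= -\frac{1}{\sigma}\,\psi_0(t_S)\,\dot{g}(x,\xi_S)', \\
\dot{\Psi}_{33}(z,\theta) &= -\frac{1}{\sigma}\,\psi_0(t_S)\,t_S.
\end{aligned}
\]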



From (61) it is easy to show that

$D_0 = E_{G_0}\dot{\Psi}(z,\theta_0)$

Therefore $|D_0| = a_{00}a_{01}d_0|C_0|^2$. It follows from (57) that $|C_0| \ne 0$ if and only if $|A_0| \ne 0$, and that

(VM' l + 6 V^o . ' 



Co 1 



aooCo eoo^o 
a iC e i6o 
d 



Direct calculation shows that 
and the desired results follow from (61).








7.5.2 Derivation of (44)

In this case, from (61) it is easy to show that

Dn = — 



"01 °o 



do 1 



aoo 





eoo 





aoi 


eoi 








d 
which implies

dm 1 — eoiajji do 



a 00 U -e 00 a 00 rf 



d^ 1 



D^ 1 = -a 
The rest of the derivation is straightforward. 

7.6 Proof of Theorems 7 and 8

7.6.1 Proof of Theorem 7

Parts (i) and (ii) follow from Theorem 4 and Remark 2. To prove (iii), we need to check the conditions of Theorem 1 and Theorem 2. We start by showing that $S(G_0) > 0$. Let

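\[
h_{\beta,\alpha}(s) = E\,\rho_0\!\left(\frac{y_i - g(x_i,\beta) - \alpha}{s}\right).
\]

Then, we have

\[
\lim_{s\to\infty} h_{\beta,\alpha}(s) = \rho_0(0) = 0 \tag{62}
\]

and

\[
\lim_{s\to 0} h_{\beta,\alpha}(s) = 1 - P\big(y_i = g(x_i,\beta) + \alpha\big). \tag{63}
\]

Since $u_i$ has a continuous distribution and is independent of $x_i$, we also have

\[
P\big(y_i = g(x_i,\beta) + \alpha\big) = P\big(g(x_i,\beta_0) + u_i = g(x_i,\beta) + \alpha\big)
= E\big[P\big(u_i = g(x_i,\beta) - g(x_i,\beta_0) + \alpha \mid x_i\big)\big] = 0. \tag{64}
\]

Equations (62), (63) and (64) imply that $S^*(G_0,\beta,\alpha) > 0$ for all $(\beta,\alpha)$, and so $S(G_0) = S^*(G_0,\beta_1,\alpha_1) > 0$.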
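The role of the limits (62) and (63) can be seen in a small computation of an M-scale of residuals. The sketch below is hedged throughout: it assumes that $S^*$ is the infimum of the values $s > 0$ with $h(s) \le \delta$ (consistent with how $S^*$ is used in the proof of Lemma 6), it uses a bisquare $\rho_0$ and $\delta = 0.5$ as hypothetical choices, and it works with an empirical distribution of residuals rather than $G_0$. The helper names `h` and `m_scale` are illustrative.

```python
def rho_bisquare(u, k=1.0):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= k."""
    v = min(abs(u) / k, 1.0)
    return 1.0 - (1.0 - v * v) ** 3

def h(residuals, s):
    """Empirical analogue of h(s) = E rho_0(r / s); decreasing in s."""
    return sum(rho_bisquare(r / s) for r in residuals) / len(residuals)

def m_scale(residuals, delta=0.5, iters=200):
    """Assumed definition: inf{s > 0 : h(s) <= delta}, found by bisection."""
    nonzero = sum(1 for r in residuals if r != 0.0) / len(residuals)
    if nonzero <= delta:
        # The limit of h(s) as s -> 0 is the fraction of nonzero residuals,
        # cf. (63); if it is already <= delta the infimum is zero.
        return 0.0
    hi = max(abs(r) for r in residuals)
    while h(residuals, hi) > delta:        # h(s) -> 0 as s -> infinity, cf. (62)
        hi *= 2.0
    lo = hi
    while h(residuals, lo) <= delta:
        lo /= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(residuals, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi

example = [-2.0, -1.0, -0.5, 0.3, 0.8, 1.5, 3.0]
scale = m_scale(example)
print("M-scale:", scale, " h at the solution:", h(example, scale))
print("degenerate case:", m_scale([0.0, 0.0, 0.0, 1.0]))
```

The degenerate call shows the situation excluded by (64): when too much mass sits exactly on the fitted curve, the scale collapses to zero.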
Note that

\[
M_{G_0}(T_{MM}(G_0)) = E_{G_0}\rho_1\!\left(\frac{y - g(x,T_{MM}(G_0))}{S(G_0)}\right)
\le E_{G_0}\rho_1\!\left(\frac{y - g(x,T_S(G_0))}{S(G_0)}\right)
\le E_{G_0}\rho_0\!\left(\frac{y - g(x,T_S(G_0))}{S(G_0)}\right) = \delta.
\]

Then $\delta < 1 - c(G_0)$ implies (14), and from Theorem 2 it follows that $T_S$ and $S$ are weakly continuous. Since $S$ is weakly continuous, Theorem 1 implies that $T_{MM}$ is weakly continuous too, and so part (iii) follows.
Part (iv) follows from the formulas obtained in Section 7.5.

Part (v) follows from part (a) of Theorem 6, while part (vi) follows from Lemma 1. Part (vii) follows from (vi), as was already shown before stating the Theorem.

7.6.2 Proof of Theorem 8

The proof is completely similar to that of Theorem 7. There are only two differences. First, for part (iii) we use that in the case of a location model we have $c(G_0) = 0$, and therefore condition (14) reduces to $M_{G_0}(T_M(G_0)) < 1$. Note that this inequality is implied by the condition that $T_M(G_0)$ is well defined, so in this case (14) always holds. Second, for part (iv) we use part (b) of Theorem 6 instead of part (a).